Overview

Dataset statistics

Number of variables: 29
Number of observations: 10000
Missing cells: 19401
Missing cells (%): 6.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 11.5 MiB
Average record size in memory: 1.2 KiB
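
As a quick sanity check, the percentage and per-record figures above can be re-derived from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that the average record size is the total memory divided by the number of observations; the variable names are illustrative and not part of the report.

```python
# Raw figures copied from the dataset statistics above.
n_variables = 29
n_observations = 10_000
missing_cells = 19_401
total_size_mib = 11.5

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total memory / number of rows.
avg_record_kib = total_size_mib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_kib:.1f} KiB")
```

Under these assumptions both values round to the figures shown in the report.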

Variable types

NUM: 13
CAT: 13
BOOL: 3

Reproduction

Analysis started: 2020-11-04 13:32:06.527267
Analysis finished: 2020-11-04 13:33:10.450852
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

emp_title has a high cardinality: 7902 distinct values (High cardinality)
Notes has a high cardinality: 6759 distinct values (High cardinality)
purpose has a high cardinality: 5160 distinct values (High cardinality)
zip_code has a high cardinality: 720 distinct values (High cardinality)
emp_title has 592 (5.9%) missing values (Missing)
Notes has 3230 (32.3%) missing values (Missing)
mths_since_last_delinq has 6316 (63.2%) missing values (Missing)
mths_since_last_record has 9160 (91.6%) missing values (Missing)
emp_length has 250 (2.5%) zeros (Zeros)
delinq_2yrs has 8910 (89.1%) zeros (Zeros)
inq_last_6mths has 4602 (46.0%) zeros (Zeros)
mths_since_last_delinq has 163 (1.6%) zeros (Zeros)
mths_since_last_record has 267 (2.7%) zeros (Zeros)
revol_bal has 278 (2.8%) zeros (Zeros)
revol_util has 254 (2.5%) zeros (Zeros)

Variables

is_bad
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
0 8705 87.1%
 
1 1295 13.0%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 2 100.0%
 
ValueCountFrequency (%) 
Common 2 100.0%
 
ValueCountFrequency (%) 
ASCII 2 100.0%
 

emp_title
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 7902
Unique (%): 84.0%
Missing: 592
Missing (%): 5.9%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
us army 54 0.5%
 
bank of america 34 0.3%
 
ibm 23 0.2%
 
wells fargo 21 0.2%
 
self 20 0.2%
 
us navy 20 0.2%
 
walmart 20 0.2%
 
at&t 18 0.2%
 
united states air force 17 0.2%
 
usaf 17 0.2%
 
Other values (7892) 9164 91.6%
 
(Missing) 592 5.9%
 

Length

Max length: 78
Mean length: 17.4252
Min length: 2
ValueCountFrequency (%) 
Lowercase_Letter 29 46.0%
 
Other_Punctuation 11 17.5%
 
Decimal_Number 10 15.9%
 
Open_Punctuation 2 3.2%
 
Currency_Symbol 1 1.6%
 
Control 1 1.6%
 
Other_Symbol 1 1.6%
 
Modifier_Symbol 1 1.6%
 
Final_Punctuation 1 1.6%
 
Close_Punctuation 1 1.6%
 
Other values (5) 5 7.9%
 
ValueCountFrequency (%) 
Common 34 54.0%
 
Latin 29 46.0%
 
ValueCountFrequency (%) 
ASCII 55 96.5%
 
Punctuation 2 3.5%
 

emp_length
Real number (ℝ≥0)

ZEROS
Distinct count: 14
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.8639
Minimum: 0
Maximum: 33
Zeros: 250
Zeros (%): 2.5%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 4
Q3: 8
95-th percentile: 10
Maximum: 33
Range: 33
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.492219402
Coefficient of variation (CV): 0.7179875001
Kurtosis: -0.7128269405
Mean: 4.8639
Median Absolute Deviation (MAD): 3.0606037
Skewness: 0.4600047179
Sum: 48639
Variance: 12.19559635
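
The derived figures for emp_length can be cross-checked against the other reported values. The sketch below assumes the usual definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared); the variable names are illustrative and not part of the report.

```python
import math

# Figures copied from the emp_length statistics above.
mean = 4.8639
std = 3.492219402
variance = 12.19559635
q1, q3 = 2, 8
minimum, maximum = 0, 33

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print("CV:", round(cv, 10))
print("IQR:", iqr, "Range:", value_range)
# The reported variance should equal the squared standard deviation
# up to the rounding of the printed digits.
print("variance consistent:", math.isclose(std ** 2, variance, rel_tol=1e-6))
```

The same checks apply to the other numeric variables in this report.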
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 6.5 7.5 9.5 10.5 33. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
10 2160 21.6%
 
1 2083 20.8%
 
2 1183 11.8%
 
3 1010 10.1%
 
4 889 8.9%
 
5 779 7.8%
 
6 535 5.3%
 
7 421 4.2%
 
8 351 3.5%
 
9 331 3.3%
 
Other values (4) 258 2.6%
 
ValueCountFrequency (%) 
0 250 2.5%
 
1 2083 20.8%
 
2 1183 11.8%
 
3 1010 10.1%
 
4 889 8.9%
 
ValueCountFrequency (%) 
33 1 < 0.1%
 
22 5 0.1%
 
11 2 < 0.1%
 
10 2160 21.6%
 
9 331 3.3%
 

home_ownership
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
rent 4745 47.4%
 
mortgage 4445 44.5%
 
own 775 7.8%
 
other 34 0.3%
 
none 1 < 0.1%
 

Length

Max length: 8
Mean length: 5.7039
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 10 100.0%
 
ValueCountFrequency (%) 
Latin 10 100.0%
 
ValueCountFrequency (%) 
ASCII 10 100.0%
 

annual_inc
Real number (ℝ≥0)

Distinct count: 1901
Unique (%): 19.0%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 68203.01154
Minimum: 2000
Maximum: 900000
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 2000
5-th percentile: 23734
Q1: 40000
Median: 58000
Q3: 82000
95-th percentile: 143550
Maximum: 900000
Range: 898000
Interquartile range (IQR): 42000

Descriptive statistics

Standard deviation: 48590.25276
Coefficient of variation (CV): 0.7124355899
Kurtosis: 51.15309953
Mean: 68203.01154
Median Absolute Deviation (MAD): 30247.19103
Skewness: 4.880305421
Sum: 681961912.4
Variance: 2361012663
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
60000 381 3.8%
 
50000 267 2.7%
 
40000 222 2.2%
 
75000 213 2.1%
 
30000 211 2.1%
 
65000 204 2.0%
 
48000 196 2.0%
 
70000 193 1.9%
 
45000 181 1.8%
 
80000 170 1.7%
 
Other values (1891) 7761 77.6%
 
ValueCountFrequency (%) 
2000 1 < 0.1%
 
4080 1 < 0.1%
 
4200 2 < 0.1%
 
4800 2 < 0.1%
 
5000 2 < 0.1%
 
ValueCountFrequency (%) 
900000 2 < 0.1%
 
860000 1 < 0.1%
 
780000 1 < 0.1%
 
744000 1 < 0.1%
 
725000 1 < 0.1%
 

verification_status
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
not verified 4367 43.7%
 
verified - income 3214 32.1%
 
verified - income source 2419 24.2%
 

Length

Max length: 24
Mean length: 16.5098
Min length: 12
ValueCountFrequency (%) 
Lowercase_Letter 13 86.7%
 
Dash_Punctuation 1 6.7%
 
Space_Separator 1 6.7%
 
ValueCountFrequency (%) 
Latin 13 86.7%
 
Common 2 13.3%
 
ValueCountFrequency (%) 
ASCII 15 100.0%
 

pymnt_plan
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
n 9998 > 99.9%
 
y 2 < 0.1%
 

Notes
Categorical

HIGH CARDINALITY
MISSING
UNIFORM
Distinct count: 6759
Unique (%): 99.8%
Missing: 3230
Missing (%): 32.3%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
personal loan 4 < 0.1%
 
debt consolidation 3 < 0.1%
 
this loan would be to consolidate my credit card debts, and have one payment at a reasonable interest rate. 2 < 0.1%
 
credit card debt consolidation 2 < 0.1%
 
refinancing 2 < 0.1%
 
i am consolidating credit card debt. 2 < 0.1%
 
i am a recent college graduate that is in need to pay down high interest credit card debt. i had to pay my own way through college and have student loans and credit card debt to show for it. i now have a good paying full time job and would like to pay down the high interest credit card debt that i have for a better financial future. 2 < 0.1%
 
camping membership 2 < 0.1%
 
borrower added on 06/30/10 > when will i get my loan in the bank<br/> borrower added on 06/30/10 > thank you<br/> borrower added on 06/30/10 > i what to open up my own business<br/> 1 < 0.1%
 
borrower added on 05/18/10 > pay off credit cards and make some small home improvements.<br/> 1 < 0.1%
 
Other values (6749) 6749 67.5%
 
(Missing) 3230 32.3%
 

Length

Max length: 3988
Mean length: 291.4693
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 34 39.1%
 
Other_Punctuation 15 17.2%
 
Decimal_Number 10 11.5%
 
Math_Symbol 6 6.9%
 
Control 4 4.6%
 
Currency_Symbol 3 3.4%
 
Dash_Punctuation 3 3.4%
 
Modifier_Symbol 2 2.3%
 
Open_Punctuation 2 2.3%
 
Initial_Punctuation 2 2.3%
 
Other values (5) 6 6.9%
 
ValueCountFrequency (%) 
Common 53 60.9%
 
Latin 34 39.1%
 
ValueCountFrequency (%) 
ASCII 68 94.4%
 
Punctuation 4 5.6%
 

purpose_cat
Categorical

Distinct count: 27
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
debt consolidation 4454 44.5%
 
credit card 1273 12.7%
 
other 1026 10.3%
 
home improvement 800 8.0%
 
major purchase 546 5.5%
 
small business 461 4.6%
 
car 349 3.5%
 
wedding 250 2.5%
 
medical 183 1.8%
 
moving 159 1.6%
 
Other values (17) 499 5.0%
 

Length

Max length: 33
Mean length: 13.9381
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 21 95.5%
 
Space_Separator 1 4.5%
 
ValueCountFrequency (%) 
Latin 21 95.5%
 
Common 1 4.5%
 
ValueCountFrequency (%) 
ASCII 22 100.0%
 

purpose
Categorical

HIGH CARDINALITY
Distinct count: 5160
Unique (%): 51.6%
Missing: 4
Missing (%): < 0.1%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
debt consolidation 760 7.6%
 
debt consolidation loan 464 4.6%
 
personal loan 231 2.3%
 
consolidation 181 1.8%
 
personal 169 1.7%
 
home improvement 158 1.6%
 
credit card consolidation 125 1.2%
 
small business loan 100 1.0%
 
loan 85 0.9%
 
consolidation loan 83 0.8%
 
Other values (5150) 7640 76.4%
 

Length

Max length: 80
Mean length: 17.1821
Min length: 1
ValueCountFrequency (%) 
Lowercase_Letter 30 43.5%
 
Other_Punctuation 17 24.6%
 
Decimal_Number 10 14.5%
 
Math_Symbol 3 4.3%
 
Currency_Symbol 2 2.9%
 
Open_Punctuation 2 2.9%
 
Connector_Punctuation 1 1.4%
 
Space_Separator 1 1.4%
 
Modifier_Symbol 1 1.4%
 
Dash_Punctuation 1 1.4%
 
ValueCountFrequency (%) 
Common 39 56.5%
 
Latin 30 43.5%
 
ValueCountFrequency (%) 
ASCII 61 95.3%
 
Punctuation 3 4.7%
 

zip_code
Categorical

HIGH CARDINALITY
Distinct count: 720
Unique (%): 7.2%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
100xx 158 1.6%
 
112xx 141 1.4%
 
945xx 129 1.3%
 
070xx 125 1.2%
 
606xx 114 1.1%
 
900xx 107 1.1%
 
021xx 99 1.0%
 
941xx 95 0.9%
 
926xx 94 0.9%
 
750xx 93 0.9%
 
Other values (710) 8845 88.4%
 

Length

Max length: 5
Mean length: 5
Min length: 5
ValueCountFrequency (%) 
Decimal_Number 10 90.9%
 
Lowercase_Letter 1 9.1%
 
ValueCountFrequency (%) 
Common 10 90.9%
 
Latin 1 9.1%
 
ValueCountFrequency (%) 
ASCII 11 100.0%
 

addr_state
Categorical

Distinct count: 50
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
ca 1748 17.5%
 
ny 958 9.6%
 
fl 714 7.1%
 
tx 700 7.0%
 
nj 482 4.8%
 
va 392 3.9%
 
il 386 3.9%
 
pa 378 3.8%
 
ga 357 3.6%
 
ma 331 3.3%
 
Other values (40) 3554 35.5%
 

Length

Max length: 2
Mean length: 2
Min length: 2
ValueCountFrequency (%) 
Lowercase_Letter 24 100.0%
 
ValueCountFrequency (%) 
Latin 24 100.0%
 
ValueCountFrequency (%) 
ASCII 24 100.0%
 

debt_to_income
Real number (ℝ≥0)

Distinct count: 2585
Unique (%): 25.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.338704
Minimum: 0
Maximum: 29.99
Zeros: 58
Zeros (%): 0.6%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2.129
Q1: 8.16
Median: 13.41
Q3: 18.6925
95-th percentile: 23.93
Maximum: 29.99
Range: 29.99
Interquartile range (IQR): 10.5325

Descriptive statistics

Standard deviation: 6.754211507
Coefficient of variation (CV): 0.5063619004
Kurtosis: -0.8546793248
Mean: 13.338704
Median Absolute Deviation (MAD): 5.669516109
Skewness: -0.008777611376
Sum: 133387.04
Variance: 45.61937308
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.055 0.195 3.395 7.675 20.325 22.835 24.965 26.885 29.99 ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 58 0.6%
 
12.48 16 0.2%
 
13.51 13 0.1%
 
10 13 0.1%
 
19.2 13 0.1%
 
18.14 13 0.1%
 
4.8 12 0.1%
 
17.82 12 0.1%
 
15.38 12 0.1%
 
22.43 12 0.1%
 
Other values (2575) 9826 98.3%
 
ValueCountFrequency (%) 
0 58 0.6%
 
0.11 1 < 0.1%
 
0.12 1 < 0.1%
 
0.13 1 < 0.1%
 
0.14 2 < 0.1%
 
ValueCountFrequency (%) 
29.99 1 < 0.1%
 
29.93 1 < 0.1%
 
29.92 1 < 0.1%
 
29.83 1 < 0.1%
 
29.74 1 < 0.1%
 

delinq_2yrs
Real number (ℝ≥0)

ZEROS
Distinct count: 10
Unique (%): 0.1%
Missing: 5
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.148174087
Minimum: 0
Maximum: 11
Zeros: 8910
Zeros (%): 89.1%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 11
Range: 11
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.5062698917
Coefficient of variation (CV): 3.416723543
Kurtosis: 54.81013986
Mean: 0.148174087
Median Absolute Deviation (MAD): 0.2641783123
Skewness: 5.639317112
Sum: 1481
Variance: 0.2563092032
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 8910 89.1%
 
1 822 8.2%
 
2 186 1.9%
 
3 50 0.5%
 
4 14 0.1%
 
5 6 0.1%
 
6 3 < 0.1%
 
7 2 < 0.1%
 
11 1 < 0.1%
 
8 1 < 0.1%
 
(Missing) 5 0.1%
 
ValueCountFrequency (%) 
0 8910 89.1%
 
1 822 8.2%
 
2 186 1.9%
 
3 50 0.5%
 
4 14 0.1%
 
ValueCountFrequency (%) 
11 1 < 0.1%
 
8 1 < 0.1%
 
7 2 < 0.1%
 
6 3 < 0.1%
 
5 6 0.1%
 

inq_last_6mths
Real number (ℝ≥0)

ZEROS
Distinct count: 20
Unique (%): 0.2%
Missing: 5
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.066933467
Minimum: 0
Maximum: 25
Zeros: 4602
Zeros (%): 46.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 4
Maximum: 25
Range: 25
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.47605196
Coefficient of variation (CV): 1.383452676
Kurtosis: 23.67847049
Mean: 1.066933467
Median Absolute Deviation (MAD): 1.01844467
Skewness: 3.116059024
Sum: 10664
Variance: 2.178729389
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 4602 46.0%
 
1 2684 26.8%
 
2 1431 14.3%
 
3 731 7.3%
 
4 227 2.3%
 
5 152 1.5%
 
6 76 0.8%
 
7 42 0.4%
 
8 27 0.3%
 
9 10 0.1%
 
Other values (10) 13 0.1%
 
(Missing) 5 0.1%
 
ValueCountFrequency (%) 
0 4602 46.0%
 
1 2684 26.8%
 
2 1431 14.3%
 
3 731 7.3%
 
4 227 2.3%
 
ValueCountFrequency (%) 
25 1 < 0.1%
 
24 1 < 0.1%
 
18 2 < 0.1%
 
17 1 < 0.1%
 
16 1 < 0.1%
 

mths_since_last_delinq
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 91
Unique (%): 2.5%
Missing: 6316
Missing (%): 63.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.89033659
Minimum: 0
Maximum: 120
Zeros: 163
Zeros (%): 1.6%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 18
Median: 34
Q3: 53
95-th percentile: 75
Maximum: 120
Range: 120
Interquartile range (IQR): 35

Descriptive statistics

Standard deviation: 22.3614429
Coefficient of variation (CV): 0.6230491276
Kurtosis: -0.8171447741
Mean: 35.89033659
Median Absolute Deviation (MAD): 18.73116249
Skewness: 0.2929154934
Sum: 132220
Variance: 500.0341287
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 163 1.6%
 
30 69 0.7%
 
34 66 0.7%
 
38 65 0.7%
 
23 65 0.7%
 
44 64 0.6%
 
24 64 0.6%
 
33 63 0.6%
 
20 63 0.6%
 
18 61 0.6%
 
Other values (81) 2941 29.4%
 
(Missing) 6316 63.2%
 
ValueCountFrequency (%) 
0 163 1.6%
 
1 6 0.1%
 
2 29 0.3%
 
3 40 0.4%
 
4 37 0.4%
 
ValueCountFrequency (%) 
120 1 < 0.1%
 
115 1 < 0.1%
 
97 1 < 0.1%
 
96 1 < 0.1%
 
95 1 < 0.1%
 

mths_since_last_record
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 94
Unique (%): 11.2%
Missing: 9160
Missing (%): 91.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 61.65238095
Minimum: 0
Maximum: 119
Zeros: 267
Zeros (%): 2.7%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 86
Q3: 101
95-th percentile: 115.05
Maximum: 119
Range: 119
Interquartile range (IQR): 101

Descriptive statistics

Standard deviation: 46.18961922
Coefficient of variation (CV): 0.7491944108
Kurtosis: -1.586694469
Mean: 61.65238095
Median Absolute Deviation (MAD): 42.56052154
Skewness: -0.3831245675
Sum: 51788
Variance: 2133.480924
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 267 2.7%
 
89 21 0.2%
 
116 18 0.2%
 
86 17 0.2%
 
87 17 0.2%
 
92 17 0.2%
 
100 16 0.2%
 
114 16 0.2%
 
104 16 0.2%
 
105 15 0.1%
 
Other values (84) 420 4.2%
 
(Missing) 9160 91.6%
 
ValueCountFrequency (%) 
0 267 2.7%
 
6 1 < 0.1%
 
11 1 < 0.1%
 
17 1 < 0.1%
 
20 2 < 0.1%
 
ValueCountFrequency (%) 
119 3 < 0.1%
 
118 11 0.1%
 
117 10 0.1%
 
116 18 0.2%
 
115 10 0.1%
 

open_acc
Real number (ℝ≥0)

Distinct count: 36
Unique (%): 0.4%
Missing: 5
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.334567284
Minimum: 1
Maximum: 39
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 6
Median: 9
Q3: 12
95-th percentile: 18
Maximum: 39
Range: 38
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.526589744
Coefficient of variation (CV): 0.4849276465
Kurtosis: 1.838467994
Mean: 9.334567284
Median Absolute Deviation (MAD): 3.516796938
Skewness: 1.063599744
Sum: 93299
Variance: 20.49001471
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
7 1035 10.3%
 
6 990 9.9%
 
8 937 9.4%
 
9 929 9.3%
 
10 805 8.1%
 
5 763 7.6%
 
11 692 6.9%
 
4 631 6.3%
 
12 577 5.8%
 
13 487 4.9%
 
Other values (26) 2149 21.5%
 
ValueCountFrequency (%) 
1 7 0.1%
 
2 163 1.6%
 
3 374 3.7%
 
4 631 6.3%
 
5 763 7.6%
 
ValueCountFrequency (%) 
39 1 < 0.1%
 
36 2 < 0.1%
 
35 1 < 0.1%
 
33 3 < 0.1%
 
32 1 < 0.1%
 

pub_rec
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 5
Missing (%): < 0.1%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
0 9422 94.2%
 
1 550 5.5%
 
2 18 0.2%
 
3 5 0.1%
 
(Missing) 5 0.1%
 

Length

Max length: 3
Mean length: 3
Min length: 3
ValueCountFrequency (%) 
Decimal_Number 4 57.1%
 
Lowercase_Letter 2 28.6%
 
Other_Punctuation 1 14.3%
 
ValueCountFrequency (%) 
Common 5 71.4%
 
Latin 2 28.6%
 
ValueCountFrequency (%) 
ASCII 7 100.0%
 

revol_bal
Real number (ℝ≥0)

ZEROS
Distinct count: 8130
Unique (%): 81.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14271.0074
Minimum: 0
Maximum: 1207359
Zeros: 278
Zeros (%): 2.8%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 277.95
Q1: 3524.5
Median: 8645.5
Q3: 16952.25
95-th percentile: 44554.85
Maximum: 1207359
Range: 1207359
Interquartile range (IQR): 13427.75

Descriptive statistics

Standard deviation: 25437.9082
Coefficient of variation (CV): 1.782488614
Kurtosis: 570.4140985
Mean: 14271.0074
Median Absolute Deviation (MAD): 11728.71446
Skewness: 16.32424653
Sum: 142710074
Variance: 647087173.7
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 2.550000e+01 7.825000e+02 5.878000e+03 ... 8.222900e+04 1.205955e+05 1.727790e+05 2.836010e+05 1.207359e+06], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 278 2.8%
 
2227 6 0.1%
 
1763 6 0.1%
 
11628 5 0.1%
 
4801 5 0.1%
 
760 5 0.1%
 
5272 4 < 0.1%
 
18550 4 < 0.1%
 
15 4 < 0.1%
 
5220 4 < 0.1%
 
Other values (8120) 9679 96.8%
 
ValueCountFrequency (%) 
0 278 2.8%
 
1 2 < 0.1%
 
3 2 < 0.1%
 
5 1 < 0.1%
 
6 2 < 0.1%
 
ValueCountFrequency (%) 
1207359 1 < 0.1%
 
602519 1 < 0.1%
 
508961 1 < 0.1%
 
487589 1 < 0.1%
 
423189 1 < 0.1%
 

revol_util
Real number (ℝ≥0)

ZEROS
Distinct count: 1027
Unique (%): 10.3%
Missing: 26
Missing (%): 0.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 48.450771
Minimum: 0
Maximum: 100.6
Zeros: 254
Zeros (%): 2.5%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2.8
Q1: 25
Median: 48.7
Q3: 71.8
95-th percentile: 93.6
Maximum: 100.6
Range: 100.6
Interquartile range (IQR): 46.8

Descriptive statistics

Standard deviation: 28.22055724
Coefficient of variation (CV): 0.5824583727
Kurtosis: -1.099296594
Mean: 48.450771
Median Absolute Deviation (MAD): 24.12794997
Skewness: -0.01672374423
Sum: 483247.99
Variance: 796.3998507
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 254 2.5%
 
46.6 21 0.2%
 
43.4 20 0.2%
 
0.1 20 0.2%
 
47.6 19 0.2%
 
56.8 19 0.2%
 
55.4 19 0.2%
 
53.6 19 0.2%
 
70 19 0.2%
 
31.4 18 0.2%
 
Other values (1017) 9546 95.5%
 
(Missing) 26 0.3%
 
ValueCountFrequency (%) 
0 254 2.5%
 
0.03 1 < 0.1%
 
0.1 20 0.2%
 
0.12 1 < 0.1%
 
0.2 11 0.1%
 
ValueCountFrequency (%) 
100.6 1 < 0.1%
 
100 1 < 0.1%
 
99.9 4 < 0.1%
 
99.8 5 0.1%
 
99.7 3 < 0.1%
 

total_acc
Real number (ℝ≥0)

Distinct count: 75
Unique (%): 0.8%
Missing: 5
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.01130565
Minimum: 1
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 13
Median: 20
Q3: 29
95-th percentile: 44
Maximum: 90
Range: 89
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 11.70939957
Coefficient of variation (CV): 0.5319720581
Kurtosis: 0.9238037612
Mean: 22.01130565
Median Absolute Deviation (MAD): 9.292220477
Skewness: 0.8707976619
Sum: 220003
Variance: 137.1100383
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
15 369 3.7%
 
20 360 3.6%
 
17 360 3.6%
 
12 357 3.6%
 
14 351 3.5%
 
19 346 3.5%
 
16 340 3.4%
 
18 339 3.4%
 
13 331 3.3%
 
22 329 3.3%
 
Other values (65) 6513 65.1%
 
ValueCountFrequency (%) 
1 3 < 0.1%
 
2 10 0.1%
 
3 58 0.6%
 
4 115 1.1%
 
5 144 1.4%
 
ValueCountFrequency (%) 
90 1 < 0.1%
 
81 1 < 0.1%
 
80 1 < 0.1%
 
79 1 < 0.1%
 
78 1 < 0.1%
 

initial_list_status
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
f 9983 99.8%
 
m 17 0.2%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Lowercase_Letter 2 100.0%
 
ValueCountFrequency (%) 
Latin 2 100.0%
 
ValueCountFrequency (%) 
ASCII 2 100.0%
 

collections_12_mths_ex_med
Boolean

Distinct count: 1
Unique (%): < 0.1%
Missing: 32
Missing (%): 0.3%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
0 9968 99.7%
 
(Missing) 32 0.3%
 

mths_since_last_major_derog
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
2 3424 34.2%
 
3 3299 33.0%
 
1 3277 32.8%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 3 100.0%
 
ValueCountFrequency (%) 
Common 3 100.0%
 
ValueCountFrequency (%) 
ASCII 3 100.0%
 

policy_code
Categorical

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
pc3 2098 21.0%
 
pc5 2025 20.2%
 
pc1 1978 19.8%
 
pc2 1962 19.6%
 
pc4 1937 19.4%
 

Length

Max length: 3
Mean length: 3
Min length: 3
ValueCountFrequency (%) 
Decimal_Number 5 71.4%
 
Lowercase_Letter 2 28.6%
 
ValueCountFrequency (%) 
Common 5 71.4%
 
Latin 2 28.6%
 
ValueCountFrequency (%) 
ASCII 7 100.0%
 

cr_line_yrs
Real number (ℝ≥0)

Distinct count: 50
Unique (%): 0.5%
Missing: 5
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1997.014807
Minimum: 1970
Maximum: 2069
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1970
5-th percentile: 1984
Q1: 1994
Median: 1998
Q3: 2001
95-th percentile: 2006
Maximum: 2069
Range: 99
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 7.741003471
Coefficient of variation (CV): 0.003876287468
Kurtosis: 21.63770675
Mean: 1997.014807
Median Absolute Deviation (MAD): 5.242684033
Skewness: 1.78954221
Sum: 19960163
Variance: 59.92313473
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
2000 839 8.4%
 
1998 748 7.5%
 
1999 715 7.1%
 
2001 642 6.4%
 
1997 601 6.0%
 
1996 592 5.9%
 
1995 518 5.2%
 
1994 513 5.1%
 
2002 503 5.0%
 
2003 455 4.5%
 
Other values (40) 3869 38.7%
 
ValueCountFrequency (%) 
1970 14 0.1%
 
1971 11 0.1%
 
1972 13 0.1%
 
1973 18 0.2%
 
1974 14 0.1%
 
ValueCountFrequency (%) 
2069 9 0.1%
 
2068 7 0.1%
 
2067 6 0.1%
 
2066 2 < 0.1%
 
2065 2 < 0.1%
 

cr_line_mths
Real number (ℝ≥0)

Distinct count: 12
Unique (%): 0.1%
Missing: 5
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.855527764
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
Median: 7
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.546390325
Coefficient of variation (CV): 0.5173037653
Kurtosis: -1.244692025
Mean: 6.855527764
Median Absolute Deviation (MAD): 3.09118289
Skewness: -0.1776976186
Sum: 68521
Variance: 12.57688434
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
10 1052 10.5%
 
11 999 10.0%
 
12 972 9.7%
 
9 923 9.2%
 
1 904 9.0%
 
8 794 7.9%
 
7 771 7.7%
 
5 740 7.4%
 
6 740 7.4%
 
2 728 7.3%
 
Other values (2) 1372 13.7%
 
ValueCountFrequency (%) 
1 904 9.0%
 
2 728 7.3%
 
3 696 7.0%
 
4 676 6.8%
 
5 740 7.4%
 
ValueCountFrequency (%) 
12 972 9.7%
 
11 999 10.0%
 
10 1052 10.5%
 
9 923 9.2%
 
8 794 7.9%
 

cr_line_days
Boolean

Distinct count: 1
Unique (%): < 0.1%
Missing: 5
Missing (%): < 0.1%
Memory size: 78.2 KiB
ValueCountFrequency (%) 
1 9995 > 99.9%
 
(Missing) 5 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the slope of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
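
The definition above can be written out in a few lines. This is a minimal pure-Python sketch for illustration, not the implementation used to produce this report; the function name `pearson_r` and the sample data are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    # A constant input gives zero variance and raises ZeroDivisionError.
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
steep = [100 * v - 7 for v in x]      # increasing line, large slope
shallow = [0.5 * v + 10 for v in x]   # increasing line, small slope
falling = [-2 * v + 1 for v in x]     # decreasing line

print(pearson_r(x, steep), pearson_r(x, shallow), pearson_r(x, falling))
```

The two increasing lines give the same r despite their different slopes, which illustrates the invariance under changes of location and scale; the decreasing line only flips the sign.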

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
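
In other words, ρ is Pearson's r applied to ranks. The sketch below illustrates this in pure Python; the helper names (`average_ranks`, `pearson_r`, `spearman_rho`) are hypothetical, and giving tied values the mean of their positions is an assumption, since the text does not say how ties are handled.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear relationship

print("Spearman:", spearman_rho(x, y))
print("Pearson: ", pearson_r(x, y))
```

For this monotonic but nonlinear example, ρ reaches its maximum while Pearson's r stays below 1, which is the difference the paragraph above describes.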

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
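
The pair-counting rule can be sketched directly. The code below implements the simple form stated in the text (often called tau-a); statistical libraries commonly use tie-adjusted variants such as tau-b, and the text does not say which variant produced this report, so treat this as an illustration only. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau_a([1, 2, 3, 4], [10, 20, 30, 40]))  # all pairs concordant
print(kendall_tau_a([1, 2, 3, 4], [40, 30, 20, 10]))  # all pairs discordant
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))      # one discordant pair out of six
```

This version compares every pair and is therefore quadratic in the number of observations, which is fine for illustration but slow for large columns.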

Missing values

Sample

First rows

is_bad | emp_title | emp_length | home_ownership | annual_inc | verification_status | pymnt_plan | Notes | purpose_cat | purpose | zip_code | addr_state | debt_to_income | delinq_2yrs | inq_last_6mths | mths_since_last_delinq | mths_since_last_record | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | collections_12_mths_ex_med | mths_since_last_major_derog | policy_code | cr_line_yrs | cr_line_mths | cr_line_days
00time warner cable10mortgage50000.0not verifiednNaNmedicalmedical766xxtx10.870.00.0NaNNaN15.00.01208712.144.0f0.01pc41992.012.01.0
10ottawa university1rent39216.0not verifiednborrower added on 04/14/11 > i will be using this loan to pay off expenses accrued in the last six months on my credit cards, due to a combination of job transition, relocation for the job, and medical expenses from a broken tibula. i generally overpay my monthly minimum on my debts, so i expect that this loan will be repaid sooner than 5 years. i have a steady job working in the information technology field, i've been employed full-time in this field for over eight years, and have been with my present employer for seven months in good standing. my monthly budget breakdown is 1/3 of my paycheck going to rent and bills, 1/3 going to living and job transit expenses, and 1/3 remaining for general spending and payments.<br/>debt consolidationmy debt consolidation loan660xxks9.150.02.0NaNNaN4.00.01011464.05.0f0.02pc12005.011.01.0
20kennedy wilson4rent65000.0not verifiednNaNcredit cardap personal loan916xxca11.240.00.0NaNNaN4.00.0810.68.0f0.03pc41970.06.01.0
30town of plattekill10mortgage57500.0not verifiednNaNdebt consolidationdebt consolidation loan124xxny6.181.00.016.0NaN6.00.01003037.123.0f0.02pc21982.09.01.0
40belmont correctional10mortgage50004.0verified - incomeni want to consolidate my debt, pay for a vacation and buy a ring.debt consolidationconsolidate439xxoh19.030.04.0NaNNaN8.00.01074040.421.0f0.03pc31999.010.01.0
50bae systems4rent47028.0verified - incomenNaNother16-oct-10200xxdc7.832.01.019.0NaN6.00.0171526.425.0f0.03pc31999.012.01.0
60peninsula counseling center10mortgage126000.0not verifiednborrower added on 05/18/10 > mick credit card consolidation loan - 100% payoff of credit card debt - amex, sears, macys and bank of america<br/>credit cardmick credit card loan103xxny14.280.00.0NaNNaN18.00.0546611.129.0f0.03pc11979.011.01.0
70health plan of nevada6mortgage42000.0verified - income sourcenborrower added on 11/29/11 > loan is for debt consolidation and will be paid timely. employed in the healthcare industry for 6 years since moving to nv 7 years ago and have always had stable job positions. thank you very much for your assistance.<br>debt consolidationcc loan891xxnv10.290.00.0NaNNaN9.00.01035495.910.0f0.03pc32006.04.01.0
80john deere2mortgage50000.0verified - incomenNaNdebt consolidationconsolidation612xxil15.360.02.0NaNNaN11.00.01966259.227.0f0.01pc52001.02.01.0
90NaN1rent40000.0not verifiednthis loan would be for a 2006 pt cruiser with only 300 miles on it. there is still a full warranty till dec. 2009 in effect.carfico score 762 want's to buy a new car926xxca6.480.01.0NaNNaN11.00.01999818.323.0f0.01pc51995.05.01.0

Last rows

is_bad | emp_title | emp_length | home_ownership | annual_inc | verification_status | pymnt_plan | Notes | purpose_cat | purpose | zip_code | addr_state | debt_to_income | delinq_2yrs | inq_last_6mths | mths_since_last_delinq | mths_since_last_record | open_acc | pub_rec | revol_bal | revol_util | total_acc | initial_list_status | collections_12_mths_ex_med | mths_since_last_major_derog | policy_code | cr_line_yrs | cr_line_mths | cr_line_days
99900konica minolta10mortgage120000.0verified - incomeni am looking ofr a loan so that i can replace my septic system.home improvementhome improvment481xxmi14.441.00.04.0NaN14.00.01471659.831.0f0.02pc21994.02.01.0
99910ametek aerospace and defense10rent63000.0verified - income sourcenborrower added on 07/09/10 > loan app completed<br/>email verified<br/>bank account verified<br/>medicallasik018xxma10.080.00.0NaNNaN6.00.0601.122.0f0.03pc11989.05.01.0
99920the reis group10rent52000.0verified - incomenborrower added on 12/14/11 > looking to be debt free in 3 yrs or less!!<br>debt consolidationconsolidation124xxny23.700.00.070.0NaN8.00.01500291.518.0f0.02pc51998.08.01.0
99930astoria fuel corp.10own95892.0verified - incomeni live in a family owned home. it is my parents, but i am allowed to live here as long as i want as long as i pay for the taxes and any home improvements the home needs. i am looking for a loan to add a bathroom on the second floor and finish other small home improvements the house currently needs. i have lived here for 5 years and have done many updates already. this is the first major renovation i am doing on the house. i need the loan to get what i cannot do myself done the right way. i have excellent credit and pride myself on that.home improvementupdates needed on family owned home110xxny8.700.02.0NaNNaN3.00.0213930.67.0f0.03pc51995.07.01.0
99941guitar center1rent24996.0verified - income sourcenNaNdebt consolidationpersonal loan913xxca3.790.00.0NaNNaN2.00.0480156.57.0f0.01pc12005.08.01.0
99950cabot5mortgage66250.0verified - incomenNaNweddingscottish wedding014xxma9.400.01.0NaNNaN8.00.0365624.110.0f0.02pc32001.09.01.0
99960gallant & wein1rent26000.0verified - income sourcenborrower added on 08/30/11 > credit cards consolidation and doctors bills..<br/>debt consolidationdebt112xxny20.490.01.079.0NaN8.00.0670958.912.0f0.02pc32000.05.01.0
99970weichert, realtors8rent47831.0not verifiednborrower added on 03/10/10 > my dream is to finally end the cycle of revolving debt so that i can finally build a stable future for myself. i've been able to put some hardships behind and can see the light ahead if my loan is fully funded. i live modestly but cannot corral rising life expenses unless i can put away credit debt and service this loan. i will be a worthy lendingclub loan recipient. thank you for your consideration!<br/>debt consolidationharnessing credit debt for a stable future.070xxnj24.130.00.0NaN111.09.01.01134660.717.0f0.03pc31989.012.01.0
99980meadwestvaco6mortgage70000.0not verifiednNaNmajor purchasepersonal244xxva16.182.02.016.0NaN9.00.01715750.927.0f0.02pc31999.03.01.0
99990rehab alliance1rent70560.0not verifiednborrower added on 11/09/11 > order to pay back lenders quicker. also, never been late on a payment. job: very stable, full-time job (40 hours/wk). thank you!<br>credit cardcredit card loan900xxca16.130.01.053.0NaN15.00.0230422.634.0f0.02pc52000.09.01.0